Survival analysis is a collection of statistical methods in which the response variable of interest is the time until an event occurs. In this context, time can be measured in days, weeks, months or years from the beginning of follow-up of an individual until an event occurs, or as the age of the individual when the event occurs. The event can be death, disease, remission, recovery or any other experience of interest that may happen to an individual. More detailed information can be found in Kleinbaum and in Marubini and Valsecchi.
We developed an easy-to-use, up-to-date, comprehensive and interactive web-based tool for survival analysis. The tool includes analysis procedures for life tables, Kaplan-Meier, Cox regression, regularized Cox regression and random survival forests. Each procedure includes the following features:
Life table: descriptive statistics, life table, median life time, hazard ratios and comparison tests including Log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, Modified Peto-Peto and Fleming-Harrington.
Kaplan-Meier: descriptive statistics, survival table, mean and median life time, hazard ratios, comparison tests including Log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, Modified Peto-Peto and Fleming-Harrington, and interactive plots such as Kaplan-Meier curves and hazard plots.
Cox regression: coefficient estimates, hazard ratios, goodness of fit tests, analysis of deviance, options to save predictions, residuals, Martingale residuals, Schoenfeld residuals and DfBetas, a proportional hazards assumption test, and interactive plots including the Schoenfeld residual plot and the Log-Minus-Log plot.
Regularized Cox regression: variable selection and coefficient estimation using ridge, elastic net and lasso penalties.
Random survival forests: individual survival and cumulative hazard predictions using random survival forests, and interactive plots including survival (with OOB), hazard (with OOB), error rate vs number of trees, and Cox regression vs random survival forests model.
This tool requires a dataset in *.txt format, delimited by comma, semicolon, space or tab. The first row of the dataset must contain the header. Once an appropriate file is uploaded, the dataset will appear immediately on the main page of the tool. Alternatively, users can load one of the example datasets provided within the tool to test it and understand how it operates.
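The input format described above can be illustrated with a small sketch. It is not the tool's own loader: the function names `guess_delimiter` and `read_dataset` are hypothetical, and the rule for choosing the delimiter (the candidate that occurs most often in the header row) is an assumption made for illustration.

```python
import csv
import io

# Delimiters the tool is described as accepting.
CANDIDATE_DELIMITERS = [",", ";", "\t", " "]


def guess_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often in the header row."""
    counts = {d: header_line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(CANDIDATE_DELIMITERS, key=lambda d: counts[d])
    if counts[best] == 0:
        raise ValueError("no known delimiter found in header row")
    return best


def read_dataset(text):
    """Parse delimited text whose first row is the header into a list of dicts."""
    lines = text.strip().splitlines()
    delimiter = guess_delimiter(lines[0])
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical three-column dataset: survival time, status and a factor.
example = "time;status;group\n5;1;A\n8;0;B\n"
rows = read_dataset(example)
print(rows[0])
```

All values are read as strings here; converting survival time to a number would be a separate step.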
Kaplan-Meier is a non-parametric statistical method used to estimate survival probabilities and hazard ratios for a cohort study group. In clinical trials, it is often used to measure the fraction of patients living for a certain period of time after a treatment.
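As a rough illustration of the estimator, the following sketch computes the Kaplan-Meier product-limit estimate for a single group. The function name `kaplan_meier` and the toy data are hypothetical; standard errors and confidence limits are left out, and this is not the code the tool itself runs.

```python
def kaplan_meier(times, events):
    """Return a list of (time, n_at_risk, n_events, survival) rows.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    """
    table = []
    survival = 1.0
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        n_at_risk = sum(1 for ti in times if ti >= t)
        # Events observed exactly at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if n_events == 0:
            continue  # censoring-only times do not change the estimate
        survival *= 1.0 - n_events / n_at_risk
        table.append((t, n_at_risk, n_events, survival))
    return table


# Hypothetical data: six subjects, two of them censored (event = 0).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for row in kaplan_meier(times, events):
    print(row)
```

Each printed row corresponds to one event time, with the number at risk, the number of events and the cumulative survival probability.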
Survival time: Time until an event occurs (e.g. days, weeks, months, years)
Status variable: The event (e.g. death, disease, remission, recovery)
Category value for status variable: Category value of the event of interest (e.g. 1, yes)
Factor variable: A categorical variable which indicates different study groups (e.g. treatment, gender)
A Kaplan-Meier analysis can be conducted by applying the following steps:
Select Kaplan Meier from the Analysis tab.
Select survival time, status variable, category value for status variable and factor variable, if one exists.
Click the Run button to run the analysis.
Desired outputs can be selected by clicking the Outputs checkbox. Available outputs are:
Summary statistics, such as the number and percentage of observations, events and censored cases, can be obtained.
A survival table can be created. The first column in the table represents the factor group and the index of the time point (e.g. 1.2 means the second time point in the first factor group; likewise, 2.1 means the first time point in the second group). The second column is the survival time, the third column gives the number of subjects at risk, the fourth column is the number of events, and the fifth column is the cumulative probability of surviving. The sixth, seventh and eighth columns are the associated standard error, lower limit and upper limit, respectively.
A forest plot can be created for each level of the factor group using the survival probabilities at each end point.
Mean and median life time and their associated confidence intervals can be calculated for each level of the factor group.
Hazard ratios and their respective lower and upper limits can be calculated for each factor group at each end point.
A forest plot can be created for each level of factor group using hazard ratios at each end point.
Six different comparison tests can be performed to test the differences in survival probability estimates between factor groups.
Kaplan-Meier curves can be created. A number of edit options are also available for plots.
A hazard plot can be created. A number of edit options are also available for plots.
A Log-Minus-Log plot can be created. A number of edit options are also available for plots.
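To give a feel for how the comparison tests work, here is a minimal two-group sketch of the weighted log-rank family. With a constant weight it gives the Log-rank test; weighting by the number at risk gives Gehan-Breslow, and weighting by its square root gives Tarone-Ware. The function name `weighted_logrank` and the data are hypothetical, the Peto-Peto and Fleming-Harrington weights are not covered, and the tool's own implementation may differ.

```python
import math


def weighted_logrank(times, events, groups, weight=lambda n_at_risk: 1.0):
    """Two-group weighted log-rank test; returns (chi_square, p_value), 1 df."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    first = labels[0]
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # n, n1: subjects at risk overall and in the first group.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        # d, d1: events at time t overall and in the first group.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(
            1
            for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == first
        )
        w = weight(n)
        observed_minus_expected += w * (d1 - d * n1 / n)
        if n > 1:  # hypergeometric variance is zero when one subject remains
            variance += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the test is not defined for these data")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical data for two groups.
times = [6, 7, 10, 15, 19, 25, 1, 2, 3, 4, 5, 8]
events = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
groups = ["A"] * 6 + ["B"] * 6
print(weighted_logrank(times, events, groups))                    # Log-rank
print(weighted_logrank(times, events, groups, weight=float))      # Gehan-Breslow
print(weighted_logrank(times, events, groups, weight=math.sqrt))  # Tarone-Ware
```

The three calls differ only in how much emphasis they place on early event times, when more subjects are still at risk.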
Cox regression, also known as proportional hazards regression, is a method for investigating the effect of one or more factors on the time until an event of interest occurs. In this model, the effect of a unit increase in a factor is multiplicative with respect to the hazard rate.
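The multiplicative effect can be made concrete with a short sketch. Given a coefficient estimate and its standard error, the hazard ratio for a one-unit increase is the exponential of the coefficient. The function names and the numeric values below are hypothetical, and the confidence limits are plain Wald limits, which may not match the tool's output exactly.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Hazard ratio for a one-unit increase, with Wald confidence limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def relative_hazard(beta, x_new, x_ref):
    """Hazard at x_new relative to the hazard at x_ref for one predictor."""
    return math.exp(beta * (x_new - x_ref))


# Hypothetical coefficient and standard error for a continuous predictor.
beta, se = 0.35, 0.10
hr, lower, upper = hazard_ratio(beta, se)
print(hr, lower, upper)

# A two-unit increase multiplies the hazard by the one-unit ratio twice.
print(relative_hazard(beta, x_new=3.0, x_ref=1.0), hr ** 2)
```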
A Cox regression analysis can be conducted by applying the following steps:
Select Cox Regression from the Analysis tab.
Select survival time, status variable, category value for status variable, and categorical and continuous predictors for the model.
Interaction terms, strata terms and time-dependent covariates can be added to the model. Moreover, if there are multiple records per observation, users can specify this by clicking the Multiple ID checkbox. Furthermore, one can choose the model selection criterion (AIC or p-value), the model selection method (backward, forward or stepwise), the reference category (first or last) and the ties method (Efron, Breslow or exact), and change the confidence level.
Click the Run button to run the analysis.
Desired outputs can be selected by clicking the Outputs checkbox. Available outputs are coefficient estimates, hazard ratios, goodness of fit tests, analysis of deviance, predictions, residuals, Martingale residuals, Schoenfeld residuals and DfBetas.
A coefficient estimation table, which includes variable names, coefficient estimates and their associated standard errors, z statistics and p values, can be created.
A hazard ratio table, which includes variable names, hazard ratios and their associated lower and upper limits, can be created.
A forest plot can be created for the hazard ratios to allow visual inspection.
The fitted Cox regression model can be tested with three tests: likelihood ratio, Wald and score.
A deviance analysis can be conducted for each variable in the fitted model.
Predictions from the fitted model can be obtained.
Residuals from the fitted model can be obtained.
Martingale residuals from the fitted model can be obtained.
Schoenfeld residuals from the fitted model can be obtained.
DfBetas from the fitted model can be obtained.
To check the proportionality assumption of the Cox regression model, a proportional hazards test can be conducted both globally and for each variable in the fitted model.
Besides a formal test of the proportionality assumption, a Schoenfeld plot can be created to check the assumption visually.
Another useful plot for checking the proportionality assumption is the log-minus-log plot. The lines should be parallel to each other if proportionality holds.
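The reason parallel lines indicate proportionality can be sketched numerically. Under proportional hazards, the survival function of one group is the baseline survival raised to the power of the hazard ratio, so the log-minus-log curves differ by a constant equal to the log hazard ratio. The baseline values and the hazard ratio below are hypothetical.

```python
import math


def log_minus_log(survival):
    """Return log(-log(S)) for a survival probability strictly between 0 and 1."""
    if not 0.0 < survival < 1.0:
        raise ValueError("survival probability must lie strictly between 0 and 1")
    return math.log(-math.log(survival))


# Hypothetical baseline survival probabilities at four time points.
baseline = [0.95, 0.85, 0.70, 0.50]
hazard_ratio = 2.0

# Under proportional hazards: S_1(t) = S_0(t) ** hazard_ratio.
second_group = [s ** hazard_ratio for s in baseline]

# Vertical gap between the two log-minus-log curves at each time point.
gaps = [
    log_minus_log(s1) - log_minus_log(s0)
    for s0, s1 in zip(baseline, second_group)
]
print(gaps)
print(math.log(hazard_ratio))
```

Every gap equals the log of the hazard ratio up to rounding, which is what appears as parallel curves in the plot.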
Feature selection is a useful strategy to avoid over-fitting, to obtain more reliable predictive results, and to provide more insight into the underlying causal relationships (Ma and Huang, 2008). In this section, feature selection can be performed using a ridge, elastic net or lasso penalty, which is especially useful when the number of predictors is large relative to the number of observations (e.g. n<<p). More information can be found in Zou and Hastie, 2005, Friedman et al., 2008 and Simon et al., 2011.
A penalized Cox regression analysis can be conducted by applying the following steps:
Select Penalized Cox Regression from the Analysis tab.
Select survival time and status variable.
Check the Select All Variables option to include all variables in the dataset in the feature selection process. If some predictors are categorical and others are continuous, uncheck the Select All Variables option and select categorical and continuous variables separately.
Set the Penalty term slider as follows:
Penalty term = 0: ridge penalty
0 < Penalty term < 1: elastic net penalty
Penalty term = 1: lasso penalty
Click the Run button to run the analysis.
Variable selection is conducted with the selected penalized method (i.e. ridge, elastic net or lasso) and the results are displayed as a table, which includes the selected variables and their associated coefficient estimates.
A cross-validation curve can be created to investigate the relationship between partial likelihood deviance and lambda values.
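The role of the penalty term can be sketched as follows. This assumes the parameterisation commonly used for elastic net, in which the penalty term mixes the L1 (lasso) and squared L2 (ridge) norms and lambda scales the whole penalty; whether the tool uses exactly this form is an assumption. The function name `elastic_net_penalty` and the coefficients are hypothetical.

```python
def elastic_net_penalty(coefficients, penalty_term, lam):
    """Penalty added to the negative partial log-likelihood.

    penalty_term = 0 gives ridge, 1 gives lasso, values in between elastic net.
    Assumed form: lam * (penalty_term * L1 + (1 - penalty_term) / 2 * L2^2).
    """
    if not 0.0 <= penalty_term <= 1.0:
        raise ValueError("penalty_term must lie between 0 and 1")
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (penalty_term * l1 + 0.5 * (1.0 - penalty_term) * l2_squared)


# Hypothetical coefficient vector and lambda value.
coefficients = [1.0, -2.0, 0.0]
for penalty_term in (0.0, 0.5, 1.0):
    print(penalty_term, elastic_net_penalty(coefficients, penalty_term, lam=1.0))
```

Under this form, only penalties with an L1 component can shrink coefficients exactly to zero, so a pure ridge penalty shrinks coefficients without removing variables.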
Random survival forests (RSF) is an ensemble method for analysing right-censored data, first introduced by Ishwaran et al., 2008. RSF has several advantages over Cox regression: (i) Unlike Cox regression, RSF does not rely on the proportional hazards assumption. (ii) RSF accounts for nonlinear effects and interactions for factor variables.
A random survival forests analysis can be conducted by applying the following steps:
Select Random Survival Forests from the Analysis tab.
Select survival time, status variable, category value for status variable, and categorical and continuous predictors for the model.
Interaction terms, strata terms and time-dependent covariates can be added to the model. Moreover, if there are multiple records per observation, users can specify this by clicking the Multiple ID checkbox. From the RSF options, the number of trees, bootstrap method, number of randomly selected variables, minimum number of cases in a terminal node, maximum depth of a tree, splitting rule, number of splits, handling of missing values, number of iterations of the missing data algorithm, proximity of cases, size of bootstrap and type of bootstrap can be adjusted.
Click the Run button to run the analysis.
Survival predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Out of bag (OOB) survival predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Cumulative hazard predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Out of bag (OOB) cumulative hazard predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
An error rate table, which shows the error rate estimate for each tree, can be obtained.
A variable importance table as well as an interactive plot, which shows relative importance of variables in fitted model, can be obtained.
A survival plot can be drawn for survival predictions from random survival forests model. Each line represents a survival curve for each observation.
A survival plot can be drawn for OOB survival predictions from random survival forests model. Each line represents a survival curve for each observation.
A cumulative hazard plot can be drawn for cumulative hazard predictions from the random survival forests model. Each line represents the cumulative hazard curve of one observation.
A cumulative hazard plot can be drawn for OOB cumulative hazard predictions from the random survival forests model. Each line represents the cumulative hazard curve of one observation.
An interactive error rate plot, which shows how the error rate changes as the number of trees increases, can be drawn.
A Cox model can be compared to random survival forests model through an interactive plot for visual inspection of both models.
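The out-of-bag (OOB) outputs listed above rest on a simple idea: each tree is grown on a bootstrap sample, and the observations that were not drawn for that tree can be used to evaluate it. The sketch below only illustrates that split, with a hypothetical function name and sample size; it does not grow any trees and is not the tool's implementation.

```python
import random


def bootstrap_split(n_observations, rng):
    """Draw one bootstrap sample with replacement.

    Returns the drawn indices (in-bag) and the indices never drawn (out-of-bag).
    """
    in_bag = [rng.randrange(n_observations) for _ in range(n_observations)]
    out_of_bag = sorted(set(range(n_observations)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(10, rng)
print("in-bag indices:    ", sorted(in_bag))
print("out-of-bag indices:", out_of_bag)
```

For large samples roughly 37% of observations are left out of each bootstrap sample, so every observation is out-of-bag for a sizeable share of the trees and can receive an OOB prediction from them.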